Large Memory Layers with Product Keys
This paper introduces a structured memory which can be easily integrated into a neural network. The memory is very large by design and significantly increases the capacity of the architecture, by up to a billion parameters with a negligible computational overhead. Its design and access pattern are based on product keys, which enable fast and exact nearest neighbor search. The ability to increase the number of parameters while keeping the same computational budget lets the overall system strike a better trade-off between prediction accuracy and computation efficiency both at training and test time. This memory layer allows us to tackle very large scale language modeling tasks. In our experiments we consider a dataset with up to 30 billion words, and we plug our memory layer into a state-of-the-art transformer-based architecture. In particular, we found that a memory-augmented model with only 12 layers outperforms a baseline transformer model with 24 layers, while being twice as fast at inference time. We release our code for reproducibility purposes.
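To make the retrieval mechanism concrete, here is a minimal sketch of a product-key lookup in PyTorch. It is an illustration reconstructed from the abstract, not the authors' released code; the function and variable names (product_key_lookup, sub_keys1, sub_keys2, values, k) are assumptions. A query is split into two halves, each half is scored against a small set of sub-keys, and the exact top-k over the full Cartesian product of keys is recovered from the two per-half top-k lists.

```python
# Minimal sketch of a product-key memory lookup (illustrative, not the
# authors' released code; names and shapes are assumptions).
import torch
import torch.nn.functional as F

def product_key_lookup(query, sub_keys1, sub_keys2, values, k=4):
    """Return a sparse weighted sum of memory values for one query.

    query:     (d,) tensor, split into two halves of size d // 2
    sub_keys1: (n1, d // 2) first set of sub-keys
    sub_keys2: (n2, d // 2) second set of sub-keys
    values:    (n1 * n2, v) memory values, one per (i, j) key pair
    """
    d = query.shape[0]
    q1, q2 = query[: d // 2], query[d // 2:]

    # Score each half against its sub-keys: O(n1 + n2) instead of O(n1 * n2).
    s1 = sub_keys1 @ q1                      # (n1,)
    s2 = sub_keys2 @ q2                      # (n2,)

    # Top-k per half; candidate scores live on a k x k grid of score sums.
    v1, i1 = s1.topk(k)
    v2, i2 = s2.topk(k)
    cand = v1[:, None] + v2[None, :]         # (k, k)

    # Exact: any pair in the global top-k must use a top-k sub-key in each
    # half, so searching the k x k grid recovers the true nearest neighbors.
    scores, flat = cand.view(-1).topk(k)
    rows, cols = flat // k, flat % k
    idx = i1[rows] * sub_keys2.shape[0] + i2[cols]   # flat indices into values

    weights = F.softmax(scores, dim=0)       # (k,)
    return weights @ values[idx]             # (v,)

# Example: 512 sub-keys per half address 512 * 512 = 262,144 memory slots
# while scoring only 1,024 sub-keys per query.
out = product_key_lookup(torch.randn(64), torch.randn(512, 32),
                         torch.randn(512, 32), torch.randn(512 * 512, 128))
```

This quadratic blow-up of addressable slots relative to the number of scored sub-keys is what lets the memory grow to very large sizes at negligible computational overhead, as claimed in the abstract.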
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Reviews: Large Memory Layers with Product Keys
UPDATE: The authors answered my questions; I would like to keep my score unchanged and suggest focusing on the clarity of the final version. Perhaps this is a case where I would really be interested in looking at the source code. Originality: the paper borrows the general idea of product keys from the database community; however, the application to fast retrieval in neural memory systems seems quite novel to me. Quality: the core ideas of the paper are sound; however, I would appreciate more rigor in both the conceptual and experimental comparison with other approaches that incorporate memory into the Transformer (see e.g. …). Another suggestion would be to discuss further the issue of potential non-uniformity of the query distribution, which indeed seems quite relevant.
There exists some disagreement among the reviewers of this paper. Two of the reviewers believe that the introduction of the memory layer with product keys and its incorporation into the Transformer architecture is novel, interesting, and can open doors to new use cases for efficient memory-augmented neural nets. The other reviewer believes that the memory layer is merely an implementation detail, which has not been shown useful for applications other than large-scale language modeling. I believe that the contributions of the paper are significant enough to warrant acceptance, especially given how important language modeling has become in modern NLP. Furthermore, because the proposed architecture is very different from common NLP and Computer Vision architectures, I recommend acceptance as a spotlight.
Large Memory Layers with Product Keys
Lample, Guillaume; Sablayrolles, Alexandre; Ranzato, Marc'Aurelio; Denoyer, Ludovic; Jegou, Herve
Microsoft adds two more Dynamics 365 AI apps to its roadmap (ZDNet)
Microsoft is adding more AI-infused Dynamics 365 applications to its line-up. In addition to the previously announced Dynamics 365 AI for Sales app, Microsoft is also introducing a Dynamics 365 AI for Customer Service app and a Dynamics 365 AI for Market Insights app this fall, officials said on September 18. In July this year, Microsoft publicly released its 238-page release notes document for its coming October 2018 wave of Dynamics 365 ERP and CRM applications. The October 2018 releases will include more than 100 incremental updates to the core Dynamics Sales, Marketing, Customer Service, Portals, Omni-channel Engagement Hub, Field Service, Project Service, Social Engagement, Finance and Operations, Talent, Retail, and Business Central products and services. Officials reiterated today that Microsoft will start rolling out its October 2018 Dynamics 365 and Power platform deliverables generally on October 1. Back in July, Microsoft officials also demonstrated the coming Dynamics 365 AI for Sales app, which they said would be available in public preview form in October 2018.